Establishment and validation of a heart failure risk prediction model for elderly patients after coronary rotational atherectomy based on machine learning

Objective To develop and validate a machine learning-based heart failure risk prediction model for elderly patients after coronary rotational atherectomy (CRA). Methods In this retrospective cohort study, 303 elderly patients with severe coronary calcification were selected as the study subjects. According to the occurrence of postoperative heart failure, the subjects were divided into a heart failure group (n = 53) and a non-heart failure group (n = 250). Clinical data recorded during hospitalization were collected retrospectively. After missing values in the original data were processed and class imbalance was addressed with the adaptive synthetic sampling (ADASYN) method, the final dataset consisted of 502 samples: 250 negative samples (patients without heart failure) and 252 positive samples (patients with heart failure). The 502 samples were randomly divided at a 7:3 ratio into a training set (n = 351) and a validation set (n = 151). On the training set, logistic regression (LR), extreme gradient boosting (XGBoost), support vector machine (SVM), and light gradient boosting machine (LightGBM) algorithms were used to construct heart failure risk prediction models. Model performance was evaluated on the validation set by calculating the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, F1-score, and prediction accuracy. Result Postoperative heart failure occurred in 17.49% of the 303 patients. The AUCs of the LR, XGBoost, SVM, and LightGBM models in the training set were 0.872, 1.000, 0.699, and 1.000, respectively. After 10-fold cross-validation, the AUCs in the training set were 0.863, 0.972, 0.696, and 0.963, respectively. XGBoost had the highest AUC and the best predictive performance, while the SVM model performed worst.
The XGBoost model also showed good predictive performance in the validation set (AUC = 0.972, 95% CI [0.951–0.994]). The Shapley additive explanation (SHAP) method suggested that six characteristic variables (blood cholesterol, serum creatinine, fasting blood glucose, age, triglyceride, and NT-proBNP) were important positive factors for the occurrence of heart failure, and that LVEF was an important negative factor. Conclusion Seven characteristic variables (blood cholesterol, serum creatinine, fasting blood glucose, NT-proBNP, age, triglyceride, and LVEF) are all important factors affecting the occurrence of heart failure. The XGBoost-based heart failure risk prediction model for elderly patients after CRA is superior to the SVM, LightGBM, and traditional LR models. This model could be used to assist clinical decision-making and reduce adverse outcomes in patients after CRA.


INTRODUCTION
Coronary artery calcification (CAC) is sclerosis caused by calcium salt deposition in the coronary arteries. It is common in patients with coronary heart disease and carries a high incidence and case fatality rate (Junwei & Wei, 2021; Rui et al., 2023). For patients with severe calcification, percutaneous coronary intervention (PCI) may face significant challenges, such as vascular detachment, obstructed stent delivery, and insufficient stent opening (Xiaonan et al., 2021). In addition, coronary calcification can lead to myocardial perfusion damage and polymer coating failure on drug-eluting stents (Guijun & Qi, 2019). Therefore, coronary artery calcification should be detected and managed promptly before PCI to avoid adverse cardiac events.
Coronary rotational atherectomy (CRA) is a method for treating severe coronary artery calcification. A high-speed rotating burr grinds the calcified tissue into tiny particles that can be engulfed by the body's phagocyte system (Qian, Tao & Jun, 2020; Uetani & Amano, 2018). With the popularization of PCI in China, patients undergoing interventional therapy are generally older, and the number of patients with severe coronary artery calcification has increased significantly. Therefore, the use of CRA for vessel pretreatment has become very important (Jichun et al., 2020). Research has found that elderly patients often have multiple underlying diseases and decreased cardiovascular function, which increases the risk of adverse cardiac events after CRA. Among these events, heart failure is one of the most common and one of the main causes of postoperative death (Zhengwei et al., 2023). Therefore, early prediction of heart failure after CRA is of great significance for patient prognosis.
With improvements in computer performance and the optimization of algorithms, machine learning has been widely applied in clinical research. Over the past decade, many research teams have built machine learning models to predict the risk of clinical events (Lauritsen et al., 2021; Cilloniz et al., 2023). However, no machine learning research has yet been reported on predicting the risk of heart failure in patients after CRA. Therefore, this study takes patients after CRA as the research subjects, uses three widely used machine learning algorithms (XGBoost, SVM, and LightGBM) to establish prediction models of postoperative heart failure risk, and compares them with the traditional logistic regression model, in order to provide new ideas for clinicians to identify patients at high risk of heart failure early and carry out precise intervention.

Research subject
This study is a retrospective cohort study in which 303 elderly patients with severe coronary calcification were selected consecutively as the study subjects between June 2017 and June 2021. These patients were hospitalized in the Department of Cardiology of a tertiary hospital in Anhui Province, China, and underwent CRA.
Inclusion criteria: (1) Age over 60 years. (2) The patient met the indications for CRA treatment, and the patient or family members signed the informed consent form for the surgery. (3) Grade IV coronary intimal calcification, with strong echoic masses along the vessel wall and a maximum calcification arc of >270°, as confirmed by intravascular ultrasound examination.
Exclusion criteria: (1) Presence of ulcerative or thrombotic lesions that can be exacerbated by rotational atherectomy. (2) Presence of chronic occlusive lesions. (3) Severe mental illness and inability to communicate normally. (4) Lesions that are prone to thrombosis and embolism, such as degenerative saphenous vein bridge lesions. (5) Presence of intimal tearing lesions that can be worsened by rotational atherectomy. (6) Severe angulation (>60°) lesions. (7) Acute myocardial infarction in the acute phase (≤7 days). (8) Severe functional impairment of vital organs such as the liver, lungs, or kidneys.
The standard for coronary artery calcification lesions was that, on coronary angiography, the vessel diameter stenosis exceeded 50% and the reference diameter was greater than or equal to 1.5 millimeters (Zhengwei et al., 2023).

Data collection
Postoperative clinical data of the research subjects were collected retrospectively through the hospital information system (HIS). The data comprised 19 indicators: gender, age, comorbidities (chronic renal insufficiency, ischemic cardiomyopathy, cerebrovascular disease, hypertension, diabetes), left ventricular ejection fraction (LVEF), N-terminal pro-brain natriuretic peptide (NT-proBNP), serum creatinine, fasting blood glucose, glycated hemoglobin, preoperative creatine kinase isoenzyme (CK-MB), preoperative troponin, hemoglobin, total cholesterol (TC), triglyceride (TG), low-density lipoprotein cholesterol (LDL-C), and very low-density lipoprotein cholesterol (VLDL-C). The incidence of postoperative heart failure was calculated. Heart failure was diagnosed with reference to the criteria in the "Chinese Guidelines for the Diagnosis and Treatment of Heart Failure 2018" (Jianjun, 2019), using NYHA grading, cardiac ultrasound, and other examinations. According to the occurrence of postoperative heart failure, the study subjects were divided into two groups, with 53 cases in the heart failure group and 250 cases in the non-heart failure group. This study was approved by the Medical Research Ethics Committee of the First Affiliated Hospital of University of Science and Technology of China (approval number: 2021-RE-026). Owing to the retrospective nature of the study, the requirement for informed consent was waived.

Data cleaning and pre-processing
Missing data can make analysis more difficult and may bias the results, thereby reducing accuracy (Yun et al., 2023). Random forest, an ensemble classifier based on decision trees, can be used to handle missing data: it is robust to noise, is not easily affected by outliers, and places no restriction on the type of data distribution (Xiaoqin & Yuying, 2017). A variable with more than 20% missing values should be excluded from the final dataset (Zhang et al., 2023). In this study, the random forest regression method was used to impute indicators with less than 20% missing data, and outliers in the dataset were replaced with the median. In addition, all continuous variables underwent Z-score normalization, which scales each variable to a distribution with mean 0 and standard deviation 1, while categorical variables were processed with one-hot encoding. One-hot encoding converts non-numeric category data into a numerical form that machine learning models can use. Z-score normalization removes differences in units and centers the data so that the scales of different features are consistent, thereby improving the efficiency and accuracy of model training. In binary classification tasks, a ratio of negative to positive samples close to 1:1 helps avoid bias toward the majority class. Therefore, to improve the final predictive performance of the model, this study employed the adaptive synthetic sampling (ADASYN) technique to mitigate class imbalance. Compared with traditional random oversampling, ADASYN not only balances negative and positive samples but also reduces the occurrence of overfitting (He et al., 2008). After the missing values in the original data were processed and the sample imbalance was addressed, the final dataset consisted of 502 samples: 250 negative samples (patients without heart failure) and 252 positive samples (patients with heart failure).
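The oversampling step described above can be illustrated with a minimal pure-Python sketch of the ADASYN idea (He et al., 2008). This is not the implementation used in the study: the function name `adasyn_sketch`, the parameters `k` and `beta`, and the toy two-feature data are all assumptions made for illustration. Minority samples that have more majority-class neighbours receive more synthetic points, and each synthetic point is an interpolation between a minority sample and one of its minority-class neighbours.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def adasyn_sketch(X, y, k=5, beta=1.0, seed=0):
    """Return synthetic minority-class (label 1) samples, ADASYN style."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_min = len(minority)
    n_maj = len(X) - n_min
    n_needed = int((n_maj - n_min) * beta)  # samples required for balance

    # r_i: share of majority-class points among the k nearest neighbours
    ratios = []
    for x in minority:
        others = [(p, label) for p, label in zip(X, y) if p is not x]
        others.sort(key=lambda item: sq_dist(x, item[0]))
        ratios.append(sum(1 for _, label in others[:k] if label == 0) / k)
    total = sum(ratios)
    if total == 0:
        return []  # no minority point borders the majority class

    synthetic = []
    for x, r in zip(minority, ratios):
        n_new = round(r / total * n_needed)  # harder points get more samples
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: sq_dist(x, p))[:k]
        for _ in range(n_new):
            z = rng.choice(neighbours)
            gap = rng.random()
            # interpolate between x and a minority-class neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy data (hypothetical): 50 negatives near (0, 0), 10 positives near (1.5, 1.5)
toy_rng = random.Random(42)
X = [[toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)] for _ in range(50)]
X += [[toy_rng.gauss(1.5, 1), toy_rng.gauss(1.5, 1)] for _ in range(10)]
y = [0] * 50 + [1] * 10
new_samples = adasyn_sketch(X, y)
print(len(y) - sum(y), sum(y) + len(new_samples))  # negatives vs positives
```

Because the number of synthetic points is rounded per minority sample, the classes end up close to, but not necessarily exactly at, a 1:1 ratio. This may be why the study's balanced dataset contains 252 positive and 250 negative samples rather than equal counts.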

Model construction
To improve the generalization ability of the model and reduce overfitting, this study followed relevant research (Wang et al., 2021) and used LASSO regression analysis to screen for meaningful variables, which were then included in model construction. The dataset of 502 samples was randomly divided into a training set (351 samples) and a testing set (151 samples) at a 7:3 ratio. This randomization was achieved through the combined use of several SPSS 25.0 modules, including Random Number Generation, Compute Variable, Rank Cases, and Select Cases. Four algorithms were used to construct the heart failure risk prediction models: the traditional logistic regression model (LR), extreme gradient boosting (XGBoost), support vector machine (SVM), and light gradient boosting machine (LightGBM). XGBoost and LightGBM are widely used ensemble learning models that combine multiple learners to obtain better generalization performance, while SVM is a widely used kernel-based classifier. The testing set was used to validate and evaluate the performance of the selected model.
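As a rough illustration of the 7:3 random split, the following sketch shuffles sample indices and cuts them at 70%. The study performed this step in SPSS 25.0, so the Python function `split_indices` and the seed value are assumptions introduced here, not the authors' procedure.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=2021):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible random order
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(502)
print(len(train_idx), len(test_idx))  # 351 151
```

With 502 samples, 70% is 351.4, which truncates to the 351 training and 151 testing samples reported in the text.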

Model evaluation
The predictive performance of the machine learning models was evaluated using multiple indicators, including the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, prediction accuracy, and F1-score (the harmonic mean of precision and recall) (Xin et al., 2023). In addition, the clinical applicability of the model was evaluated using decision curve analysis (DCA) (Jingjing et al., 2023), the calibration of the model was evaluated with the calibration curve, and the interpretability of the model was evaluated using the Shapley additive explanation (SHAP) method (Ruihao et al., 2022).
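The evaluation indicators listed above follow directly from the confusion matrix and from the ranking of predicted scores. The sketch below shows one way to compute them; the function names, confusion-matrix counts, and scores are hypothetical and are not taken from the study's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    ppv = tp / (tp + fp)                  # positive predictive value, precision
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}


def auc_from_scores(scores, labels):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
auc = auc_from_scores([0.9, 0.8, 0.35, 0.3, 0.1], [1, 1, 0, 1, 0])
print(round(metrics["f1"], 3), round(auc, 3))
```

The pairwise definition of AUC is quadratic in the sample size, which is acceptable for a validation set of 151 samples; library routines typically use a rank-based equivalent.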

Statistical analysis
SPSS 25.0, R (3.6.3; R Core Team, 2020), and Python (3.7.0) were used for data analysis. Continuous data with a skewed distribution are expressed as M (P25, P75), and the Mann-Whitney U test was used for comparison between groups. Categorical data are expressed as frequency (%), and the Pearson chi-squared test was used for comparison between groups. All tests were two-sided, and P < 0.05 was considered statistically significant.
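For a two-group comparison of a binary characteristic, the Pearson chi-squared test reduces to a closed-form statistic on a 2 × 2 table with one degree of freedom. The sketch below is illustrative only: the function name `chi_squared_2x2` and the counts are hypothetical and are not taken from Table 1. No continuity correction is applied.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and the two-sided P value for 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi-squared > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: rows are heart failure / non-heart failure groups,
# columns are with / without a given comorbidity
chi2, p_value = chi_squared_2x2(20, 33, 40, 210)
print(round(chi2, 3), p_value < 0.05)
```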

Comparison of clinical data of patients
Among the 303 study subjects, 53 patients developed postoperative heart failure, an incidence of 17.49%. Compared with the non-heart failure group, the heart failure group had a higher proportion of chronic renal insufficiency, a lower LVEF, and higher levels of NT-proBNP, serum creatinine, fasting blood glucose, blood cholesterol, triglyceride, and low-density lipoprotein cholesterol, with statistically significant differences (all P < 0.05), as shown in Table 1.

Comparison of training set and testing set
This study randomly divided the 502 samples into a training set and a testing set at a 7:3 ratio, consisting of 351 and 151 samples, respectively. There was no statistically significant difference in clinical data between the two sets (P > 0.05), indicating that the two datasets were homogeneous and comparable, as shown in Table 2.

Feature variable screening results
In the training set, LASSO regression with 10-fold cross-validation was used to screen characteristic variables from the 19 indicators. The variables with non-zero regression coefficients at the largest Lambda whose error is within one standard error of the minimum (Lambda.1se) were selected as the characteristic variables. The LASSO regression results show that Lambda.1se is 0.037, and the corresponding characteristic variables are triglyceride, age, blood cholesterol, fasting blood glucose, serum creatinine, NT-proBNP, and LVEF. The Lambda with the minimum mean square error (Lambda.min) is 0.014, and the corresponding characteristic variables are VLDL-C, triglyceride, blood cholesterol, HbA1c, fasting blood glucose, creatinine, NT-proBNP, LVEF, chronic renal insufficiency, ischemic cardiomyopathy, cerebrovascular disease, diabetes, age, and gender (Fig. 1).
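The difference between Lambda.min and Lambda.1se can be made concrete with a short sketch of the selection rule applied to a cross-validation curve. The function name `select_lambdas` and the grid of Lambda values, mean errors, and standard errors are hypothetical; they are not the values behind Fig. 1.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Pick Lambda.min and Lambda.1se from cross-validation results.

    Lambda.min minimises the mean cross-validation error. Lambda.1se is the
    largest Lambda whose mean error is within one standard error of that
    minimum, which gives a sparser model.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean)
                     if err <= threshold)
    return lambdas[i_min], lambda_1se


# Hypothetical cross-validation curve, for illustration only
lambdas = [0.2, 0.1, 0.05, 0.02, 0.01]
cv_mean = [0.95, 0.80, 0.74, 0.70, 0.72]
cv_se = [0.05, 0.05, 0.05, 0.05, 0.05]
lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)  # 0.02 0.05
```

A larger Lambda shrinks more coefficients to zero, which is consistent with the text: Lambda.1se retains seven variables, whereas Lambda.min retains fourteen.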

Model construction
In the training set, four algorithms (LR, XGBoost, LightGBM, and SVM) were used to build machine learning prediction models from the seven characteristic variables corresponding to Lambda.1se in the LASSO regression (triglyceride, blood cholesterol, fasting blood glucose, creatinine, NT-proBNP, LVEF, and age), and 10-fold cross-validation was used for internal validation of the models. The results showed that the XGBoost model had the highest AUC in both the training set and the internal validation set, suggesting that XGBoost was the optimal model, as shown in Table 3 and Fig. 2.
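The 10-fold internal validation partitions the training set into ten folds, each held out once while the model is fitted on the other nine. The sketch below generates such partitions for the 351 training samples. The function name `k_fold_indices` and the seed are assumptions, and model fitting is omitted.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, held_out) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, folds[i]


fold_sizes = [len(held_out) for _, held_out in k_fold_indices(351)]
print(fold_sizes)  # one fold of 36 samples and nine folds of 35
```

The cross-validated AUC is then the average of the AUCs obtained on the ten held-out folds.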

Model performance evaluation
The ability of the XGBoost algorithm to predict the risk of heart failure in patients after CRA was tested in the testing set. The AUC of the XGBoost model in the testing set was 0.972, the prediction accuracy was 0.921, the sensitivity was 1.000, the specificity was 0.892, the positive predictive value was 0.911, the negative predictive value was 0.931, and the F1-score was 0.954. See Fig. 3A for the ROC curve of the XGBoost model in the testing set. The clinical applicability of the XGBoost model was evaluated using the DCA curve in the testing set. The DCA curve showed that when the threshold probability of heart failure was between 1% and 90%, the net benefit of using the XGBoost model for risk assessment was the highest, significantly better than the "full intervention" and "no intervention" strategies. This suggests that the XGBoost model has good clinical applicability, as shown in Fig. 3B. The calibration curve in the testing set showed good agreement between the probability predicted by the XGBoost model and the observed frequency of postoperative heart failure, suggesting that the XGBoost model is well calibrated, as detailed in Fig. 3C.
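The net benefit plotted in a decision curve has a simple closed form: at threshold probability p_t, net benefit = TP/n − (FP/n) × p_t / (1 − p_t). The sketch below evaluates it for a model and for the "full intervention" strategy; the "no intervention" strategy always has a net benefit of 0. The function names, labels, and predicted probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'full intervention' strategy (everyone treated)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical labels and predicted probabilities, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(round(nb_model, 3), round(nb_all, 3))  # 0.3 -0.4
```

A model is clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison summarized in Fig. 3B.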

Interpretability analysis of the model
To explore the main factors influencing the occurrence of heart failure and to improve the interpretability of the classification model, this study used the SHAP method to conduct an interpretability analysis of the XGBoost model (Jiasi, 2023). Figure 4A shows the importance ranking of each feature variable in the XGBoost model. The order of importance is fasting blood glucose, NT-proBNP, blood cholesterol, serum creatinine, triglyceride, LVEF, and age. The importance of these seven characteristic variables is relatively concentrated, and they can be regarded as important factors affecting the occurrence of heart failure.
..., 2018). Therefore, it is necessary to assess the risk of heart failure as early as possible for patients after CRA and to take preventive measures for those at high risk, in order to prevent deterioration of the condition and save the patient's life. However, the scoring tools in common use require manual operation, which is time-consuming, labor-intensive, and inefficient. Accurate, convenient, and fast disease assessment can assist clinical decision-making and enable timely treatment, and is of great significance for the risk assessment of heart failure and the prognosis of patients.
With the continuous development of algorithms and computer hardware and the arrival of the era of big data, machine learning has shown great advantages in mining and processing medical data. It is now widely used to predict the occurrence and prognosis of clinical diseases (Jie, 2022; Peng et al., 2022), such as the risk of coronary heart disease (Jialun et al., 2023; Haoxuan et al., 2022), the risk of recurrence after radiofrequency ablation of atrial fibrillation (Huanxu et al., 2022), and adverse outcomes after acute coronary syndrome (D'Ascenzo et al., 2021). This study used the LR, XGBoost, SVM, and LightGBM algorithms to establish risk prediction models for heart failure in elderly patients after CRA, and compared the predictive capabilities of the four models. XGBoost, SVM, and LightGBM are three widely used machine learning algorithms. LightGBM and XGBoost are ensemble methods based on boosting, while SVM uses a nonlinear mapping to find the optimal hyperplane that partitions the feature space (Yingying et al., 2022). LR is a classic regression analysis method commonly used to study disease risk factors and predict the probability of disease occurrence. However, Xiao et al. (2019) found that, compared with some machine learning algorithms, traditional LR has greater prediction errors and poorer prediction performance. The results of this study support this conclusion: the AUCs of the LR model in the training set and internal validation set were 0.891 and 0.865, respectively, and the accuracies were 0.790 and 0.760, all lower than those of the XGBoost model. The XGBoost model has better discrimination and can more accurately predict the risk of heart failure, demonstrating excellent performance in identifying patients at high risk of heart failure.
In addition, the univariate analysis in this study found that, compared with the non-heart failure group, the heart failure group had more patients with chronic renal insufficiency, a lower LVEF, and higher levels of NT-proBNP, serum creatinine, fasting blood glucose, blood cholesterol, triglycerides, and low-density lipoprotein cholesterol, with statistically significant differences (all P < 0.05). This finding is consistent with the six characteristic variables screened by LASSO regression (blood cholesterol, triglycerides, serum creatinine, fasting blood glucose, NT-proBNP, and LVEF) and with the SHAP analysis, which suggested that the same six variables are all important factors affecting the occurrence of heart failure. Both LASSO regression and the SHAP method also suggest that age is an important predictor of heart failure after CRA. Previous findings are similar to the results of this study: LVEF < 45% and NT-proBNP ≥ 1,800 ng/L at admission are independent risk factors for heart failure after percutaneous coronary intervention (PCI) in patients with acute myocardial infarction (AMI) (Chenglong et al., 2022). High fasting blood glucose is a high-risk factor in elderly hypertensive patients with heart failure (Jianfeng & Xiaoyan, 2017). Serum creatinine is an independent factor influencing the long-term prognosis of chronic heart failure and has an overall adverse effect on its long-term mortality (Houliang et al., 2022). Abnormal serum cholesterol levels may increase the risk of heart failure (Guoying, 2010). In their study of the relationship between lipid levels and heart function in patients with chronic heart failure, Hui & Xiaofang (2005) found that triglyceride is independently correlated with the occurrence and severity of heart failure and can be used as a reference indicator of its severity. This is consistent with the finding of this study that triglyceride is an important predictor of heart failure in patients after CRA. Meili & Chunge (2023) found that age over 65 years is one of the important factors affecting the occurrence of heart failure during hospitalization in patients with acute myocardial infarction. This is consistent with the finding in this study that age is an important predictor of heart failure after CRA, and suggests that targeted and reasonable interventions should be carried out for elderly patients to reduce the incidence of heart failure.
This study also has some limitations. First, it was a single-center retrospective study conducted at a tertiary hospital in Anhui Province, China. Although the researchers made every effort to retrospectively collect the clinical data of elderly patients with and without heart failure after CRA, the sample size is still smaller than that of some large studies, so more multicenter, large-sample, prospective cohort studies are needed to evaluate the performance of the XGBoost model. Second, this study was not validated on external datasets, so the generalizability of the results to other regions cannot be ensured. Multicenter validation research is particularly important for evaluating the model's generalization ability. Finally, because the indicators were collected retrospectively, specific factors related to coronary rotational atherectomy were not included, such as lesion location (proximal, midshaft, or distal), maximum burr diameter, number of burrs used, vessel characteristics (presence of bifurcations, tortuosity, or complete occlusion), and target vessel involvement. These omissions may restrict the applicability of our findings to certain populations. Therefore, a multicenter, large-scale, prospective study that collects a broader array of indicators is particularly necessary. Such a study would not only allow the generalization ability and robustness of the predictive model constructed here to be assessed, but would also improve the specificity and targeting of postoperative heart failure prediction models by including more procedure-specific indicators.

CONCLUSION
It is feasible to use four machine learning algorithms (LR, XGBoost, SVM, and LightGBM) to establish risk prediction models of heart failure for patients after CRA. The XGBoost model has the best predictive performance. This model can identify patients at high risk of postoperative heart failure in advance and help medical staff make treatment decisions and adjust treatment plans, thereby reducing the occurrence of adverse outcomes. This is of important clinical significance for medical workers.

Figure 1 Lasso regression analysis results. (A) Lasso regression coefficient diagram; (B) Lasso regression cross-validation statistical chart. The two vertical dashed lines in the chart mark the logarithm of Lambda.min, the Lambda with the minimum mean square error (left), and the logarithm of Lambda.1se, the largest Lambda within one standard error of the minimum (right). DOI: 10.7717/peerj.16867/fig-1

Figure 2 ROC curve of multiple models. (A) ROC curve in the training set; (B) ROC curve in the internal validation set. DOI: 10.7717/peerj.16867/fig-2

Figure 3

Figure 4B shows the distribution of SHAP values for each feature variable, sorted by the importance of each feature from top to bottom. The horizontal axis represents the SHAP value, and the color of each point represents the size of the feature value: red indicates a large feature value, while blue indicates a small one. A positive SHAP value indicates a positive contribution to the model's prediction of heart failure, while a negative SHAP value indicates a negative contribution. It can be seen from Fig. 4B that fasting blood glucose has the greatest impact on the prediction results of the model. As the fasting blood glucose value increases, the predicted probability of heart failure increases; that is, this feature has a positive impact on the prediction of heart failure. The trends of NT-proBNP, blood cholesterol, serum creatinine, triglyceride, and age are similar. The trend of LVEF is the opposite: as the LVEF value increases, the predicted probability of heart failure decreases. The SHAP method can analyze not only the overall influencing factors of the prediction model but also the influencing factors for an individual patient (Ruihao et al., 2022). We provide two typical examples to illustrate the interpretability of the model. One postoperative patient without heart failure had a low SHAP prediction score (0.00), as shown in Fig. 5A, while another postoperative patient who developed heart failure had a high SHAP score (0.98), as shown in Fig. 5B.
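The sign convention of SHAP values described above can be illustrated with a brute-force computation of Shapley values for a toy model. This sketch enumerates every feature ordering, which is only feasible for a handful of features and is not the tree-based algorithm that SHAP tooling applies to XGBoost models. The linear `toy_risk_model`, its weights, and the standardized feature values are all hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`.

    Each feature's value is its marginal contribution averaged over all
    orderings in which features are switched from baseline to x.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for j in order:
            current[j] = x[j]
            value = model(current)
            phi[j] += value - previous
            previous = value
    return [total / len(orderings) for total in phi]


def toy_risk_model(features):
    """Hypothetical risk score on standardized inputs (not the study's model)."""
    glucose, nt_probnp, lvef = features
    return 0.5 * glucose + 0.3 * nt_probnp - 0.4 * lvef


patient = [1.0, 2.0, -1.0]   # high glucose, high NT-proBNP, low LVEF
baseline = [0.0, 0.0, 0.0]   # the average patient after Z-score scaling
phi = shapley_values(toy_risk_model, patient, baseline)
print([round(v, 3) for v in phi])  # [0.5, 0.6, 0.4]
```

In this toy case a below-average LVEF yields a positive contribution to predicted risk, mirroring the inverse relationship reported for LVEF, and the contributions sum to the difference between the patient's score and the baseline score.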

Figure 4 Statistical plots of the SHAP analysis. (A) Order plot of variable importance for SHAP analysis; (B) statistical graph of variable contribution in SHAP analysis. DOI: 10.7717/peerj.16867/fig-4

Figure 5 Examples of individual SHAP interpretation. (A) Individual explanation for a patient without heart failure; (B) individual explanation for a patient with heart failure. DOI: 10.7717/peerj.16867/fig-5

Table 1
Comparison of clinical data between two groups of patients.

Table 2
Comparison of training set and testing set.

Table 3
Comparison results of multiple models.